# High-precision speech recognition
## Parakeet RNNT 1.1B
Parakeet RNNT 1.1B is an automatic speech recognition (ASR) model jointly developed by NVIDIA NeMo and Suno.ai. Built on the FastConformer Transducer architecture with approximately 1.1 billion parameters, it supports English speech transcription.
Tags: Speech Recognition, English · Publisher: nvidia · Downloads: 13.18k · Likes: 124
## STT En FastConformer Transducer XLarge
The NVIDIA FastConformer-Transducer XLarge is a high-performance model for English automatic speech recognition (ASR), using an optimized FastConformer architecture and a Transducer decoder with approximately 618 million parameters.
Tags: Speech Recognition, English · Publisher: nvidia · Downloads: 106 · Likes: 24
## STT En FastConformer CTC XLarge
NVIDIA FastConformer-CTC XLarge is an automatic speech recognition (ASR) model with approximately 600 million parameters, designed specifically for English speech transcription and trained with the FastConformer architecture and CTC loss.
Tags: Speech Recognition, English · Publisher: nvidia · Downloads: 216 · Likes: 2
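Several of the models in this list are trained with CTC loss. As a rough illustration of how a CTC model's frame-level output becomes text, here is a minimal greedy-decoding sketch: collapse repeated symbols, then drop the blank token. The vocabulary, blank ID, and frame sequence below are made up for illustration.

```python
BLANK = 0  # assumption: the blank token has ID 0, as in most CTC vocabularies

def ctc_greedy_decode(frame_ids, id_to_char):
    """Collapse repeats, then remove blanks (CTC greedy decoding)."""
    out = []
    prev = None
    for i in frame_ids:
        if i != prev and i != BLANK:
            out.append(id_to_char[i])
        prev = i
    return "".join(out)

# Toy vocabulary and per-frame argmax IDs (hypothetical)
vocab = {1: "c", 2: "a", 3: "t"}
frames = [0, 1, 1, 0, 2, 2, 2, 0, 3, 0]
print(ctc_greedy_decode(frames, vocab))  # -> cat
```

Note that a blank between two identical symbols keeps them distinct (`[2, 0, 2]` decodes to `"aa"`, while `[2, 2]` decodes to `"a"`), which is how CTC represents doubled letters.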
## STT En FastConformer CTC Large
A large automatic speech recognition (ASR) model based on the FastConformer architecture, designed specifically for transcribing English speech into text.
Tags: Speech Recognition, English · Publisher: nvidia · Downloads: 1,001 · Likes: 12
## STT En FastConformer Transducer Large
A large automatic speech recognition (ASR) model based on the FastConformer architecture with a Transducer decoder, designed specifically for transcribing English speech into text.
Tags: Speech Recognition, English · Publisher: nvidia · Downloads: 1,398 · Likes: 7
## Whisper Large V2 Japanese 5k Steps
A speech recognition model based on OpenAI's whisper-large-v2, fine-tuned on the Japanese CommonVoice dataset for 5,000 steps, reaching a word error rate (WER) of 0.7449.
License: Apache-2.0 · Tags: Speech Recognition, Transformers, Japanese · Publisher: clu-ling · Downloads: 144 · Likes: 20
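Word error rate (WER), quoted for several models in this list, is the word-level edit distance (substitutions + insertions + deletions) divided by the number of reference words. A minimal self-contained sketch:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat", "the bat sat"))  # 1 substitution / 3 words
```

Because every insertion counts as an error, WER can exceed 1.0, which is why it is often reported as a raw ratio rather than a percentage.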
## STT En Conformer Transducer XLarge
An automatic speech recognition (ASR) model developed by NVIDIA, based on the Conformer-Transducer architecture with approximately 600 million parameters, designed specifically for English speech transcription.
Tags: Speech Recognition, English · Publisher: nvidia · Downloads: 496 · Likes: 54
## ASR Wav2Vec2 LibriSpeech
An end-to-end automatic speech recognition system trained on the LibriSpeech dataset, combining a pre-trained wav2vec 2.0 model with CTC decoding; it performs strongly on English speech recognition tasks.
License: Apache-2.0 · Tags: Speech Recognition, English · Publisher: speechbrain · Downloads: 1,667 · Likes: 9
## Wav2Vec2 Large 960h LV60 Self With Wikipedia LM
An automatic speech recognition (ASR) system based on Facebook's wav2vec2-large-960h-lv60-self model, improved with an enhanced Wikipedia-based language model.
Tags: Speech Recognition, Transformers · Publisher: gxbag · Downloads: 15 · Likes: 2
## Wav2Vec2 Conformer RoPE Large 100h FT
A Wav2Vec2-Conformer model incorporating rotary position embeddings, fine-tuned on 100 hours of LibriSpeech data.
License: Apache-2.0 · Tags: Speech Recognition, Transformers · Publisher: facebook · Downloads: 99 · Likes: 0
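The rotary position embedding (RoPE) used by the Wav2Vec2-Conformer RoPE variants rotates each pair of feature dimensions by an angle proportional to the position, so that dot products between rotated queries and keys depend only on their relative offset. A minimal sketch in plain Python (vectors and positions below are toy values):

```python
import math

def rope(vec, pos, base=10000.0):
    """Apply a rotary position embedding to a vector at a given position.

    Dimension pair (2k, 2k+1) is rotated by pos * base**(-2k / dim),
    following the standard RoPE frequency schedule.
    """
    out = list(vec)
    dim = len(vec)
    for k in range(dim // 2):
        theta = pos * base ** (-2 * k / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * k], vec[2 * k + 1]
        out[2 * k] = x * c - y * s
        out[2 * k + 1] = x * s + y * c
    return out

q = [1.0, 0.0, 0.5, 0.5]
q3 = rope(q, pos=3)
# Rotations preserve the vector norm
print(sum(v * v for v in q), sum(v * v for v in q3))
```

The relative-position property can be checked directly: the dot product of `rope(q, m)` with `rope(k, n)` is unchanged when both positions are shifted by the same offset.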
## Wav2Vec2 Conformer RoPE Large 960h FT
A Wav2Vec2-Conformer model incorporating rotary position embeddings, pre-trained and fine-tuned on 960 hours of 16kHz-sampled LibriSpeech data; suitable for English speech recognition tasks.
License: Apache-2.0 · Tags: Speech Recognition, Transformers, English · Publisher: facebook · Downloads: 22.02k · Likes: 10
## Wav2Vec2 Conformer Rel-Pos Large 960h FT
A Wav2Vec2-Conformer model for 16kHz-sampled speech audio, using relative position embeddings, pre-trained and fine-tuned on 960 hours of LibriSpeech data.
License: Apache-2.0 · Tags: Speech Recognition, Transformers, English · Publisher: facebook · Downloads: 1,038 · Likes: 5
## Wav2Vec2 Large 960h LV60 Self 4-Gram
Based on Facebook's wav2vec2-large-960h-lv60-self model, extended with an English 4-gram language model to improve speech recognition accuracy.
License: Apache-2.0 · Tags: Speech Recognition, English · Publisher: patrickvonplaten · Downloads: 22 · Likes: 4
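The idea behind pairing an acoustic model with a 4-gram language model, as in the two models above, is to re-rank candidate transcripts by acoustic score plus a weighted LM log-probability. A toy sketch with an add-one-smoothed maximum-likelihood n-gram (the corpus, candidates, acoustic scores, and LM weight below are all made up for illustration; production systems use smoothed LMs such as KenLM):

```python
import math
from collections import Counter

def train_ngram(corpus, n=4):
    """Count n-grams and their (n-1)-word contexts over a toy corpus."""
    grams, contexts = Counter(), Counter()
    for sent in corpus:
        words = ["<s>"] * (n - 1) + sent.split() + ["</s>"]
        for i in range(n - 1, len(words)):
            grams[tuple(words[i - n + 1 : i + 1])] += 1
            contexts[tuple(words[i - n + 1 : i])] += 1
    return grams, contexts

def lm_logprob(sentence, grams, contexts, n=4, vocab_size=1000):
    """Add-one-smoothed n-gram log-probability of a sentence."""
    words = ["<s>"] * (n - 1) + sentence.split() + ["</s>"]
    lp = 0.0
    for i in range(n - 1, len(words)):
        g = tuple(words[i - n + 1 : i + 1])
        c = tuple(words[i - n + 1 : i])
        # add-one smoothing gives unseen n-grams a nonzero probability
        lp += math.log((grams[g] + 1) / (contexts[c] + vocab_size))
    return lp

grams, contexts = train_ngram(["the cat sat on the mat"] * 5)
# (acoustic_score, transcript) pairs; the scores are hypothetical
candidates = [(-1.2, "the cat sat on the mat"), (-1.1, "the cat sad on the mat")]
lm_weight = 0.5
best = max(candidates,
           key=lambda c: c[0] + lm_weight * lm_logprob(c[1], grams, contexts))
print(best[1])  # LM evidence flips the ranking toward the fluent sentence
```

Even though the second candidate has the higher acoustic score, the LM term favors the word sequence it has seen, which is exactly how an external 4-gram model corrects acoustically plausible but unlikely words.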
## Wav2Vec2 Base 960h 4-Gram
Based on Facebook's wav2vec2-base-960h model, with an added English 4-gram language model to improve automatic speech recognition (ASR) accuracy.
License: Apache-2.0 · Tags: Speech Recognition, Transformers, English · Publisher: patrickvonplaten · Downloads: 19 · Likes: 0
## STT En Conformer CTC Large
A large automatic speech recognition (ASR) model based on the Conformer architecture, trained with the CTC loss function for English speech transcription.
Tags: Speech Recognition, English · Publisher: nvidia · Downloads: 3,740 · Likes: 24
## Data2Vec Audio Large 960h
Data2Vec is a general self-supervised learning framework applicable to speech, vision, and language tasks. This large audio model is pre-trained and fine-tuned on 960 hours of LibriSpeech data and optimized for automatic speech recognition.
License: Apache-2.0 · Tags: Speech Recognition, Transformers, English · Publisher: facebook · Downloads: 2,531 · Likes: 7
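Data2Vec-style self-supervised learning trains a student network to predict the representations of a teacher whose weights are an exponential moving average (EMA) of the student's. The update itself is simple; a minimal sketch with toy weight vectors (the values and the fixed student are illustrative only):

```python
def ema_update(teacher, student, tau=0.999):
    """One EMA step: teacher <- tau * teacher + (1 - tau) * student."""
    return [tau * t + (1 - tau) * s for t, s in zip(teacher, student)]

teacher = [0.0, 0.0]
student = [1.0, -1.0]
for _ in range(1000):  # student held fixed here for illustration
    teacher = ema_update(teacher, student)
print(teacher)  # the teacher drifts slowly toward the student's weights
```

After 1000 steps with tau = 0.999 the teacher has moved a fraction 1 - 0.999^1000 (about 0.63) of the way to the student, which is why the teacher provides stable, slowly evolving prediction targets during training.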
## IWSLT ASR Wav2Vec Large 4500h
A large English automatic speech recognition model based on the Wav2Vec2 architecture, fine-tuned on 4,500 hours of multi-source speech data and supporting decoding with a language model.
Tags: Speech Recognition, Transformers, English · Publisher: nguyenvulebinh · Downloads: 27 · Likes: 2
## Simpleoier Librispeech Asr Train Asr Conformer7 Wavlm Large Raw En Bpe5000 Sp
An automatic speech recognition (ASR) model trained with the ESPnet framework, using a Conformer architecture on top of the WavLM large pre-trained model and trained on the LibriSpeech dataset.
Tags: Speech Recognition, English · Publisher: espnet · Downloads: 66 · Likes: 1
## WavLM Libri Clean 100h Large
An automatic speech recognition model based on microsoft/wavlm-large, fine-tuned on the clean 100-hour subset of the LibriSpeech ASR dataset.
Tags: Speech Recognition, Transformers · Publisher: patrickvonplaten · Downloads: 8,171 · Likes: 3
## Personal Speech To Text Model
A personal speech-to-text model fine-tuned from facebook/wav2vec2-large-robust-ft-swbd-300h, optimized for specific accents.
Tags: Speech Recognition, Transformers · Publisher: fractalego · Downloads: 75 · Likes: 6
## Wav2Vec2 Large 960h LV60
Wav2Vec2 is a speech recognition model that learns features from raw audio through self-supervised learning, achieving strong recognition performance with limited labeled data.
License: Apache-2.0 · Tags: Speech Recognition, English · Publisher: facebook · Downloads: 7,011 · Likes: 6
## WavLM Libri Clean 100h Base
An automatic speech recognition model based on microsoft/wavlm-base, fine-tuned on the clean 100-hour subset of the LibriSpeech ASR dataset.
Tags: Speech Recognition, Transformers · Publisher: patrickvonplaten · Downloads: 6,515 · Likes: 1
## HuBERT Large LS960 FT
HuBERT-Large is a self-supervised speech representation learning model, fine-tuned on 960 hours of LibriSpeech data for automatic speech recognition.
License: Apache-2.0 · Tags: Speech Recognition, Transformers, English · Publisher: facebook · Downloads: 776.27k · Likes: 66
## WavLM Libri Clean 100h Base Plus
An automatic speech recognition model based on microsoft/wavlm-base-plus, fine-tuned on the clean 100-hour subset of the LibriSpeech ASR dataset.
Tags: Speech Recognition, Transformers · Publisher: patrickvonplaten · Downloads: 126.17k · Likes: 3
## Wav2Vec2 Base 960h
The Wav2Vec2 base model developed by Facebook, pre-trained and fine-tuned on 960 hours of LibriSpeech audio for English automatic speech recognition.
License: Apache-2.0 · Tags: Speech Recognition, Transformers, English · Publisher: facebook · Downloads: 2.1M · Likes: 331
## Wav2Vec2 Large 960h LV60 Self
The Wav2Vec2 large model developed by Facebook, pre-trained and fine-tuned on 960 hours of Libri-Light and LibriSpeech audio with a self-training objective, achieving state-of-the-art results on the LibriSpeech test sets at release.
License: Apache-2.0 · Tags: Speech Recognition, English · Publisher: facebook · Downloads: 56.00k · Likes: 146
## Wav2Vec2 Base 960h
Wav2Vec2 is a self-supervised speech recognition model developed by Facebook, trained on the LibriSpeech dataset and supporting English speech-to-text.
License: Apache-2.0 · Tags: Speech Recognition, Transformers, English · Publisher: tommy19970714 · Downloads: 19 · Likes: 0
## HuBERT XLarge LS960 FT
A fine-tuned HuBERT extra-large speech recognition model trained on 960 hours of LibriSpeech data, achieving a WER of 1.8 on the LibriSpeech test set.
License: Apache-2.0 · Tags: Speech Recognition, Transformers, English · Publisher: facebook · Downloads: 8,160 · Likes: 14
## Data2Vec Audio Base 960h
Data2Vec is a general self-supervised learning framework applicable to speech, vision, and language. This base audio model is pre-trained and fine-tuned on 960 hours of LibriSpeech audio for speech recognition.
License: Apache-2.0 · Tags: Speech Recognition, Transformers, English · Publisher: facebook · Downloads: 10.61k · Likes: 12